The bootstrap and smoothed bootstrap are considered as alternative methods of estimating properties of unknown distributions such as the sampling error of parameter estimates. Criteria are developed for determining whether it is advantageous to use the smoothed bootstrap rather than the standard bootstrap. Key steps in the argument leading to these criteria include the study of the estimation of linear functionals of distributions and the approximation of general functionals by linear functionals. Consideration of an example, the estimation of the standard error in the variance-stabilized sample correlation coefficient, elucidates previously-published simulation results and also illustrates the use of computer algebraic manipulation as a useful technique in asymptotic statistics. Finally, the various approximations used are vindicated by a simulation study.
1. INTRODUCTION

1.1 The standard bootstrap

The bootstrap is an appealing non-parametric approach to the assessment of errors and related quantities in statistical estimation. The method is described and explored in detail by Efron (1979, 1982). A typical context in which the bootstrap is used is in assessing the sampling mean square error $\alpha(F)$ of an estimate $\hat\theta(X_1,\ldots,X_n)$ of a parameter $\theta(F)$ based on a sample $X_1,\ldots,X_n$ drawn from an unknown distribution $F$. If $F$ were known, $\alpha$ might be most easily estimated by repeatedly simulating samples from $F$. The standard bootstrap technique is to estimate $\alpha(F)$ by the same sampling method, but with the samples being drawn not from $F$ itself but from the empirical distribution function $F_n$ of the observed data $X_1,\ldots,X_n$. A sample from $F_n$ is generated by sampling uniformly with replacement from $\{X_1,\ldots,X_n\}$ to construct a bootstrap sample $\{X_1^*,\ldots,X_n^*\}$. For each bootstrap sample, the estimate $\hat\theta(X_1^*,\ldots,X_n^*)$ of the quantity $\theta(F_n)$ is calculated. Since arbitrarily large numbers of bootstrap samples can be constructed, $\alpha(F_n)$ can easily be estimated to any reasonable required accuracy from the simulations. The quantity $\alpha(F_n)$ is then used as an estimate of $\alpha(F)$.
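The resampling step just described can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper: it assumes a univariate sample, takes $\alpha(F)$ to be the sampling standard deviation of an arbitrary estimator, and the function name `bootstrap_sd` and its defaults are hypothetical.

```python
import random
import statistics


def bootstrap_sd(data, estimator, n_boot=1000, seed=0):
    """Standard bootstrap estimate of the sampling standard deviation of
    estimator(sample): the spread of estimator over resamples from F_n."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # A sample from F_n: n draws made uniformly with replacement.
        resample = rng.choices(data, k=n)
        replicates.append(estimator(resample))
    return statistics.stdev(replicates)


# Usage: standard error of the mean of 50 normal observations.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(50)]
se = bootstrap_sd(xs, statistics.mean, n_boot=2000, seed=2)
plug_in = statistics.pstdev(xs) / len(xs) ** 0.5   # exact alpha(F_n) for the mean
print(se, plug_in)
```

For the sample mean, $\alpha(F_n)$ has a closed form (the standard deviation of $F_n$ divided by $\sqrt n$), so the two printed numbers should be close; the simulation is only needed when no such formula exists.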

The bootstrap methodology thus consists of two main elements, which are often confused. There is firstly the idea of estimating a functional $\alpha(F)$ by its empirical version $\alpha(F_n)$ and secondly the observation that $\alpha(F_n)$ can in very many contexts be constructed by repeated resampling from the observed data. The resampling idea is an extremely important one, but it has perhaps been overstressed at the expense of the underlying estimation step. Once the two steps are conceptually separated it becomes easier to gain a fuller understanding of how the bootstrap actually works. In particular it becomes clear that there is nothing special about estimating functionals $\alpha(F)$ that are themselves sampling properties of parameter estimates; the bootstrap idea can be applied to any functional $\alpha(F)$ of interest.

1.2 The smoothed bootstrap

Because the empirical distribution $F_n$ is a discrete distribution, samples constructed from $F_n$ in the bootstrap simulations will have some rather peculiar properties. All the values taken by the members of the bootstrap samples will be drawn from the original sample values, and nearly every sample will contain repeated values. The smoothed bootstrap, suggested by Efron (1979), is a modification to the bootstrap procedure to avoid samples with these properties. The idea of the smoothed bootstrap is to perform the repeated sampling not from $F_n$ itself, but from a smoothed version $\hat F$ of $F_n$. Two possible versions of the smoothed bootstrap will be described in more detail below; whatever method of smoothing is used, the net effect of using the smoothed bootstrap is to estimate the functional $\alpha(F)$ by $\alpha(\hat F)$.
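The claim that nearly every bootstrap sample contains repeated values is easy to quantify: when the $n$ observations are distinct, a resample of size $n$ has no repeats with probability $n!/n^n$. The sketch below (hypothetical function names) evaluates this and checks it by simulation; for Efron's sample size $n = 14$ the probability is about $7.8 \times 10^{-6}$.

```python
import random
from math import factorial


def prob_no_repeats(n):
    """Probability that n draws with replacement from n distinct values
    are all different: n! / n**n."""
    return factorial(n) / n ** n


def simulated_prob_no_repeats(n, reps=100_000, seed=0):
    """Monte Carlo check: fraction of bootstrap index samples with no repeats."""
    rng = random.Random(seed)
    hits = sum(len(set(rng.choices(range(n), k=n))) == n for _ in range(reps))
    return hits / reps


for n in (5, 14):
    print(n, prob_no_repeats(n))
print(simulated_prob_no_repeats(5))
```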

The main aim of this paper is to investigate some properties of the smoothed bootstrap, in order to give some insight into circumstances when the smoothed bootstrap will give better results than the standard bootstrap. As an important by-product, the value of computer algebraic manipulation as a tool in asymptotic statistics will be demonstrated.

Efron (1982) considered the application of the bootstrap, and various other techniques, to the estimation of the standard error of the variance-stabilized transformed correlation coefficient. He illustrated by direct simulation that in a particular case a suitable smoothed bootstrap gave better estimates of standard error than the standard bootstrap. We shall discuss Efron's example later in the paper and demonstrate how his results can be elucidated and extended by using a suitable approximation argument.

Before going on to discuss the estimation of general functionals $\alpha(F)$, we shall first consider the estimation of functionals $A$ that are linear in $F$. For such functionals we shall obtain simple sufficient conditions under which using the smoothed bootstrap can decrease the mean square error in the estimation of $A(F)$.

We close this section by giving details of the two kinds of smoothed bootstrap considered in later discussion. Suppose $X_1,\ldots,X_n$ is a set of $r$-dimensional observations drawn from some $r$-variate density $f$ and that $V$ is the variance matrix of $f$, or a consistent estimator of this variance matrix, such as the sample variance matrix of the data. Choose a kernel function $K$ such that $K$ is a symmetric probability density function of an $r$-variate distribution with unit variance matrix, for example the standard $r$-variate normal density.

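Define the kernel estimate $\hat f_h(x)$ of $f(x)$ by

$$\hat f_h(x) = n^{-1}\sum_{i=1}^{n} h^{-r}|V|^{-1/2}K\{h^{-1}V^{-1/2}(x - X_i)\} \qquad (1.1)$$

and the shrunk kernel estimate $\hat f_{h,s}(x)$ by

$$\hat f_{h,s}(x) = (1+h^2)^{r/2}\,\hat f_h\{(1+h^2)^{1/2}x\}. \qquad (1.2)$$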

Density estimates in general are discussed, for example, by Silverman (1986). The smoothing parameter $h$ determines the amount by which the data are smoothed to provide estimates. Estimates of the form (1.2) have the property that the density $\hat f_{h,s}$ has the same variance structure as the original data, if $V$ is taken to be the sample variance matrix.

Given any functional $\alpha(F)$ of an $r$-variate distribution $F$, the unshrunk smoothed bootstrap estimate of $\alpha(F)$ is defined to be $\alpha(\hat F_h)$ and the shrunk smoothed bootstrap estimate is $\alpha(\hat F_{h,s})$, where $\hat F_h$ and $\hat F_{h,s}$ are the distribution functions corresponding to $\hat f_h$ and $\hat f_{h,s}$, respectively. It is easy to simulate either from $\hat f_h$ or from $\hat f_{h,s}$ by sampling with replacement from the original data and perturbing each sampled point appropriately; for details see Efron (1982) or Silverman (1986, Section 6.4). Hence values of $\alpha(\hat F_h)$ and $\alpha(\hat F_{h,s})$ can be obtained in practice by simulation if necessary.
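A draw from the kernel estimate (1.1) is a resampled data point plus kernel noise, and a draw from (1.2) is such a draw divided by $(1+h^2)^{1/2}$. The sketch below illustrates this in one dimension ($r = 1$) with a standard normal kernel; these choices, the use of the sample variance (divisor $n$) for $V$, and the function name are assumptions made for illustration. Because (1.2) shrinks towards the origin, the data are centred first in the usage example.

```python
import random
import statistics


def smoothed_bootstrap_sample(data, h, shrink=False, rng=random):
    """Draw one smoothed bootstrap sample of size n from univariate data.

    Unshrunk, (1.1): X* = X_I + h * sqrt(V) * eps with eps ~ N(0, 1), i.e. a
    draw from the kernel estimate with a standard normal kernel.
    Shrunk, (1.2): the same draw divided by sqrt(1 + h^2); (1.2) shrinks
    towards the origin, so the data are assumed to be centred at zero.
    V is taken to be the sample variance of the data (divisor n)."""
    n = len(data)
    root_v = statistics.pstdev(data)
    sample = []
    for _ in range(n):
        x = rng.choice(data) + h * root_v * rng.gauss(0.0, 1.0)
        if shrink:
            x /= (1.0 + h * h) ** 0.5
        sample.append(x)
    return sample


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(40)]
centre = statistics.mean(xs)
xs = [x - centre for x in xs]                      # centre the data at zero
draws = []
for _ in range(2000):
    draws += smoothed_bootstrap_sample(xs, 0.5, shrink=True, rng=rng)
second_moment = sum(x * x for x in draws) / len(draws)
print(second_moment, statistics.pvariance(xs))     # close: variance preserved
```

With $V$ equal to the sample variance of centred data, the second moment of the shrunk draws matches that of the data, which is the variance-preserving property noted after (1.2).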


2. LINEAR FUNCTIONALS

In this section we consider the estimation of a linear functional $A(F)$. Because $A$ is linear, standard calculus demonstrates the existence of a function $a(t)$ such that

$$A(F) = \int a(t)\,dF(t).$$

The standard bootstrap estimate $\hat A_0(F)$ will satisfy

$$\hat A_0(F) = A(F_n) = \int a(t)\,dF_n(t) = n^{-1}\sum_{i=1}^{n} a(X_i).$$

The unshrunk smoothed bootstrap estimate $\hat A_h(F)$ will satisfy

$$\hat A_h(F) = \int a(t)\,\hat f_h(t)\,dt$$

and the shrunk smoothed bootstrap estimate $\hat A_{h,s}(F)$ will satisfy

$$\hat A_{h,s}(F) = \int a(t)\,\hat f_{h,s}(t)\,dt,$$

with $\hat f_h$ and $\hat f_{h,s}$ as defined in (1.1) and (1.2) above.

In the discussion that follows we shall assume that the function $a$ has continuous derivatives of all orders required. All unspecified integrals are taken over the whole of $r$-dimensional space. Assume that $V$ is fixed and define the differential operator $D_V$ by

$$D_V a = \sum_{i=1}^{r}\sum_{j=1}^{r} V_{ij}\,\frac{\partial^2 a}{\partial x_i\,\partial x_j}.$$

Our first theorem gives a criterion for smoothing, without shrinkage, to be of potential value in the bootstrap estimation process.

THEOREM 1

Suppose $a(X)$ and $D_V a(X)$ are negatively correlated. Then the mean square error of $\hat A_h(F)$ can be reduced below that of $\hat A_0(F)$ by choosing a suitable $h > 0$.

Proof


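Assume without loss of generality that $A(F) = 0$, by replacing $a(t)$ by $a(t) - \int a(x)f(x)\,dx$ if necessary. By this assumption,

$$\mathrm{MSE}\{\hat A_h(F)\} = E\{\hat A_h(F)^2\} = \mathrm{var}\{\hat A_h(F)\} + [E\{\hat A_h(F)\}]^2.$$

Now, by some easy manipulations, $\hat A_h(F) = n^{-1}\sum_{i=1}^{n} w(X_i)$, say, where

$$w(x) = \int a(t)\,h^{-r}|V|^{-1/2}K\{h^{-1}V^{-1/2}(t - x)\}\,dt = \int K(\xi)\,a(x + hV^{1/2}\xi)\,d\xi,$$

on making the substitution $t = x + hV^{1/2}\xi$. A Taylor expansion gives

$$a(x + hV^{1/2}\xi) = a(x) + h(V^{1/2}\xi)^T\nabla a(x) + \tfrac12 h^2(V^{1/2}\xi)^T H_a(x)(V^{1/2}\xi) + O(h^3),$$

where $H_a(x)_{ij} = \partial^2 a(x)/\partial x_i\,\partial x_j$. The $h^3$ term of this expansion is an odd function of $\xi$, so by our assumptions on the kernel $K$ (symmetry and unit variance matrix) it follows that

$$w(x) = a(x) + \tfrac12 h^2 D_V a(x) + O(h^4). \qquad (2.3)$$

Then

$$E\{\hat A_h(F)\} = E\{w(X)\} = \tfrac12 h^2\int f(x)\,D_V a(x)\,dx + O(h^4), \qquad (2.4)$$

since $\int a(x)f(x)\,dx = 0$. Also, since the $X_i$ are independent,

$$n\,\mathrm{var}\{\hat A_h(F)\} = \mathrm{var}\{w(X)\} = \int a(x)^2 f(x)\,dx + h^2\int a(x)\,D_V a(x)\,f(x)\,dx + O(h^4), \qquad (2.5)$$

using (2.3). Combining (2.4) and (2.5) gives the mean square error

$$\mathrm{MSE}\{\hat A_h(F)\} = n^{-1}\int a(x)^2 f(x)\,dx + n^{-1}h^2\int a(x)\,D_V a(x)\,f(x)\,dx + O(h^4). \qquad (2.6)$$

Since $E\,a(X) = 0$, the second integral in (2.6) is exactly $\mathrm{cov}\{a(X), D_V a(X)\}$. For fixed $n$, equation (2.6) therefore demonstrates that, under the assumption that $a(X)$ and $D_V a(X)$ are negatively correlated, the mean square error of $\hat A_h(F)$ will, at least for small $h$, be smaller than that of $\hat A_0(F)$, completing the proof of the theorem.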
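With a standard normal kernel in one dimension ($r = 1$, $V$ a scalar) and a polynomial $a$, the function $w(x) = E\,a(x + hV^{1/2}\xi)$ can be evaluated exactly by binomial expansion, since $E\xi^j = (j-1)!!$ for even $j$ and $0$ for odd $j$. The sketch below, with hypothetical function names, checks (2.3): for $a(x) = x^4$ the remainder $w(x) - a(x) - \tfrac12 h^2 V a''(x)$ is exactly $3h^4V^2$, which is of order $h^4$ as claimed.

```python
from math import comb


def normal_moment(j):
    """E[xi**j] for xi ~ N(0, 1): 0 for odd j, (j - 1)!! for even j."""
    if j % 2:
        return 0
    m = 1
    for k in range(j - 1, 0, -2):
        m *= k
    return m


def poly(coeffs, x):
    """a(x) = sum_k coeffs[k] * x**k."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def second_derivative(coeffs, x):
    """a''(x) for the polynomial a."""
    return sum(k * (k - 1) * c * x ** (k - 2) for k, c in enumerate(coeffs) if k >= 2)


def smooth_poly(coeffs, x, h, V=1.0):
    """w(x) = E a(x + h sqrt(V) xi), xi ~ N(0, 1), evaluated exactly by
    expanding each power binomially and using the normal moments."""
    s = h * V ** 0.5
    total = 0.0
    for k, c in enumerate(coeffs):
        for j in range(k + 1):
            total += c * comb(k, j) * x ** (k - j) * s ** j * normal_moment(j)
    return total


quartic = [0, 0, 0, 0, 1]          # a(x) = x**4
x, V = 0.7, 2.0
for h in (0.4, 0.2, 0.1):
    rem = (smooth_poly(quartic, x, h, V) - poly(quartic, x)
           - 0.5 * h * h * V * second_derivative(quartic, x))
    print(h, rem, 3 * h ** 4 * V ** 2)   # the remainder equals 3 h^4 V^2
```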

The next theorem gives the corresponding criterion for smoothing with shrinkage to lead to more accurate bootstrap estimation. Define $a^*(X)$ by

$$a^*(X) = D_V a(X) - X\cdot\nabla a(X).$$

THEOREM 2

Suppose $a(X)$ and $a^*(X)$ are negatively correlated. Then the mean square error of $\hat A_{h,s}(F)$ can be reduced below that of $\hat A_{0,s}(F) = \hat A_0(F)$ by choosing a suitable $h > 0$.

Proof 

As before, assume without loss of generality that $A(F) = 0$. We have, by similar manipulations to those used above,

$$\hat A_{h,s}(F) = n^{-1}\sum_{i=1}^{n} w_s(X_i),$$

where

$$w_s(x) = (1+h^2)^{r/2}\int a(t)\,h^{-r}|V|^{-1/2}K[h^{-1}V^{-1/2}\{(1+h^2)^{1/2}t - x\}]\,dt = \int a\{(1+h^2)^{-1/2}(x + hV^{1/2}\xi)\}\,K(\xi)\,d\xi,$$

on making the substitution $t = (x + hV^{1/2}\xi)/(1+h^2)^{1/2}$. Now, for $h$ small, $(1+h^2)^{-1/2} = 1 - \tfrac12 h^2 + O(h^4)$, so

$$w_s(x) = \int a\{x + hV^{1/2}\xi - \tfrac12 h^2 x + O(h^3)\}\,K(\xi)\,d\xi.$$

A Taylor expansion of $a$ about $x$, and our assumptions on the kernel $K$, give

$$w_s(x) = a(x) + \tfrac12 h^2 a^*(x) + O(h^4). \qquad (2.7)$$

Now we have

$$E\{\hat A_{h,s}(F)\} = E\{w_s(X)\} = \tfrac12 h^2\int f(x)\,a^*(x)\,dx + O(h^4),$$

and, on using (2.7),

$$n\,\mathrm{var}\{\hat A_{h,s}(F)\} = \int a(x)^2 f(x)\,dx + h^2\int a(x)\,a^*(x)\,f(x)\,dx + O(h^4).$$

The proof of Theorem 2 is completed in the same way as that of Theorem 1.
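The expansion (2.7) can be checked in the same exact way for a polynomial $a$ in one dimension with a standard normal kernel, now scaling each draw by $(1+h^2)^{-1/2}$. In this hypothetical sketch, for $a(x) = x^6$ the remainder $w_s(x) - a(x) - \tfrac12 h^2 a^*(x)$ should shrink by a factor close to $2^4 = 16$ each time $h$ is halved.

```python
from math import comb


def normal_moment(j):
    """E[xi**j] for xi ~ N(0, 1): 0 for odd j, (j - 1)!! for even j."""
    if j % 2:
        return 0
    m = 1
    for k in range(j - 1, 0, -2):
        m *= k
    return m


def smooth_poly_shrunk(coeffs, x, h, V=1.0):
    """w_s(x) = E a{(x + h sqrt(V) xi) / sqrt(1 + h^2)}, xi ~ N(0, 1),
    computed exactly for a(x) = sum_k coeffs[k] * x**k."""
    s = h * V ** 0.5
    scale = (1.0 + h * h) ** -0.5
    total = 0.0
    for k, c in enumerate(coeffs):
        moment = sum(comb(k, j) * x ** (k - j) * s ** j * normal_moment(j)
                     for j in range(k + 1))
        total += c * scale ** k * moment
    return total


def a_star(coeffs, x, V=1.0):
    """a*(x) = D_V a(x) - x a'(x), which is V a''(x) - x a'(x) in one dimension."""
    d1 = sum(k * c * x ** (k - 1) for k, c in enumerate(coeffs) if k >= 1)
    d2 = sum(k * (k - 1) * c * x ** (k - 2) for k, c in enumerate(coeffs) if k >= 2)
    return V * d2 - x * d1


def remainder(coeffs, x, h, V=1.0):
    """w_s(x) - a(x) - h^2 a*(x) / 2, which (2.7) says is O(h^4)."""
    a = sum(c * x ** k for k, c in enumerate(coeffs))
    return smooth_poly_shrunk(coeffs, x, h, V) - a - 0.5 * h * h * a_star(coeffs, x, V)


sixth = [0, 0, 0, 0, 0, 0, 1]      # a(x) = x**6
for h in (0.2, 0.1, 0.05):
    print(h, remainder(sixth, 0.7, h))
```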

As a simple illustration, consider the estimation of the sixth moment $\int x^6 f(x)\,dx$ of a univariate density. It is not immediately clear whether or not smoothing is worthwhile in this case. In the notation used above, $a(x) = x^6$, $D_V a(x) = 30Vx^4$ and $a^*(x) = 30Vx^4 - 6x^6$. It follows that, setting $\mu_r = EX^r$,

$$\mathrm{cov}\{a(X), a^*(X)\} = -6\mu_{12} + 30V\mu_{10} + 6\mu_6^2 - 30V\mu_4\mu_6.$$

If, for example, $X$ has a normal distribution with mean zero and variance $V$, we have $\mu_{2j} = V^j(2j)!/(2^j j!)$ and hence

$$\mathrm{cov}\{a(X), a^*(X)\} = -34020\,V^6 < 0.$$

Thus a shrunk smoothed estimate $\int x^6\hat f_{h,s}(x)\,dx$ will always, for a suitably chosen value of $h$, give a more accurate estimate of $EX^6$ than will the raw sixth moment if $X$ is drawn from a normal distribution. Similar calculations for other distributions show that the same conclusion holds under a wide variety of distributional assumptions for $X$.
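The covariance above needs only the raw moments $\mu_4$, $\mu_6$, $\mu_{10}$ and $\mu_{12}$. A minimal check in Python (hypothetical names) reproduces the value $-34020\,V^6$ for the normal case from $\mu_{2j} = V^j(2j)!/(2^j j!)$.

```python
from math import factorial


def normal_raw_moment(r, V=1.0):
    """mu_r = E X**r for X ~ N(0, V): 0 for odd r, V**j (2j)! / (2**j j!) for r = 2j."""
    if r % 2:
        return 0.0
    j = r // 2
    return V ** j * factorial(2 * j) / (2 ** j * factorial(j))


def cov_sixth_moment(mu, V):
    """cov{a(X), a*(X)} for a(x) = x**6 and a*(x) = 30 V x**4 - 6 x**6,
    given a function mu(r) returning the raw moments E X**r."""
    return -6 * mu(12) + 30 * V * mu(10) + 6 * mu(6) ** 2 - 30 * V * mu(4) * mu(6)


for V in (1.0, 2.0):
    print(V, cov_sixth_moment(lambda r: normal_raw_moment(r, V), V))
```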

The results obtained by applying the criteria can sometimes be a little surprising. Suppose $X$ is drawn from a standard normal distribution. Application of the criterion for estimation by unshrunk smoothing demonstrates that, for small $h$, this will have a deleterious effect in the estimation of either $EX^4$ or $EX^2$ alone. However, for the linear combination of moments $E(X^4 - cX^2)$, unshrunk smoothing will be worth performing provided $c > 6$. Details of this somewhat counter-intuitive result are left to the reader to reconstruct.
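One way to reconstruct the details, under the assumptions $V = 1$ and a standard normal kernel: for $a(x) = x^4 - cx^2$ we have $D_V a(x) = 12x^2 - 2c$, and with $EX^2 = 1$, $EX^4 = 3$, $EX^6 = 15$ the criterion of Theorem 1 is $\mathrm{cov}\{a(X), D_V a(X)\} = 12(12 - 2c)$, which is negative exactly when $c > 6$. For $EX^2$ alone, $D_V a = 2$ is constant, so there is no $O(h^2)$ variance reduction to offset the squared bias. Since everything is polynomial, the mean square error is available in closed form, and the hypothetical sketch below finds the best $h$ on a grid.

```python
def mse_smoothed(h, n, c):
    """Exact MSE of the unshrunk smoothed estimate of E(X^4 - c X^2) when
    X ~ N(0, 1), the kernel is standard normal and V is fixed at 1.

    Here w(x) = x^4 + (6h^2 - c) x^2 + 3h^4 - c h^2, so with b = 6h^2 - c:
    var w(X) = 96 + 2 b^2 + 24 b  and  bias = (6 - c) h^2 + 3 h^4."""
    b = 6 * h * h - c
    variance = (96 + 2 * b * b + 24 * b) / n
    bias = (6 - c) * h * h + 3 * h ** 4
    return variance + bias * bias


def best_h(n, c, h_max=2.0, steps=2000):
    """Grid search for the MSE-minimising h on [0, h_max]."""
    grid = [h_max * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda h: mse_smoothed(h, n, c))


for c in (4, 6, 8):
    h = best_h(20, c)
    print(c, h, mse_smoothed(h, 20, c) / mse_smoothed(0.0, 20, c))
```

For $c = 4$ and $c = 6$ the grid search returns $h = 0$ (no smoothing), while for $c = 8$ a positive $h$ gives a ratio below one.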

We do not, in this paper, devote much attention to the question of how much smoothing should be applied in cases where smoothing is worth performing; that problem is left for future work. However, the last example of this section demonstrates that the question of how much to smooth can be a rather delicate one. In this example, let $\phi_\sigma$ denote the density of the normal distribution with mean zero and variance $\sigma^2$. Let

$$A_\varepsilon(F) = \int \phi_\varepsilon(t)\,dF(t),$$

and suppose that the quantity $\varepsilon$ converges to zero as the sample size increases. Assume that $F$ has a smooth density $f$ with derivatives of all orders required. Consider the estimation of $A_\varepsilon(F)$ by the unshrunk smoothed estimator $\hat A_h(F)$, constructed using the normal density as the kernel. We shall investigate the optimal large-sample behaviour of the smoothing parameter $h$. Assume throughout that $h$ is small for large $n$ and that $f(0) > 0$.

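Taking $V = 1$, setting $\delta^2 = \varepsilon^2 + h^2$ and performing some simple manipulations gives

$$\hat A_h(F) = \int \phi_\varepsilon(t)\,\hat f_h(t)\,dt = n^{-1}\sum_{i=1}^{n}\phi_\delta(X_i).$$

Hence, substituting $t = u\delta$ and performing a Taylor series expansion,

$$E\hat A_h(F) = \int \phi_\delta(t)f(t)\,dt = \int \phi_1(u)f(u\delta)\,du = f(0) + \tfrac12\delta^2 f''(0) + O(\delta^4).$$

Since, by a similar argument,

$$A_\varepsilon(F) = \int \phi_\varepsilon(t)f(t)\,dt = f(0) + \tfrac12\varepsilon^2 f''(0) + O(\varepsilon^4),$$

it follows that

$$E\hat A_h(F) - A_\varepsilon(F) = \tfrac12 h^2 f''(0) + O(\delta^4).$$

By standard arguments

$$\mathrm{var}\{\hat A_h(F)\} = n^{-1}\mathrm{var}\{\phi_\delta(X)\} = n^{-1}f(0)/(2\delta\sqrt{\pi})\,\{1 + O(\delta)\}.$$

Thus the mean square error of $\hat A_h(F)$ will be, asymptotically, given by

$$\mathrm{MSE}\{\hat A_h(F)\} \approx n^{-1}f(0)/(2\delta\sqrt{\pi}) + \tfrac14 h^4 f''(0)^2 = n^{-1}f(0)/(2\delta\sqrt{\pi}) + \tfrac14(\delta^2 - \varepsilon^2)^2 f''(0)^2,$$

where the terms neglected are of order $n^{-1} + \delta^6$. This approximate mean square error is a convex function of $\delta$, and its minimizer will satisfy

$$\delta^3(\delta^2 - \varepsilon^2) = C(f)\,n^{-1}, \qquad \text{where } C(f) = f(0)/\{2\sqrt{\pi}\,f''(0)^2\},$$

or, in terms of $h$ and $\varepsilon$,

$$(1 + h^2/\varepsilon^2)^{3/2}\,h^2/\varepsilon^2 = C(f)\,n^{-1}\varepsilon^{-5}. \qquad (2.8)$$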

Denote by $\psi(R)$ the root in $(0, \infty)$ of the equation

$$(1 + \psi^2)^{3/2}\,\psi^2 = R;$$

then by simple calculus $\psi(R) \sim R^{1/2}$ as $R \to 0$ and $\psi(R) \sim R^{1/5}$ as $R \to \infty$. The asymptotically optimal $h$ for the estimation of $A_\varepsilon$ will satisfy, from (2.8),

$$h_{\mathrm{opt}} \sim \varepsilon\,\psi\{C(f)\,n^{-1}\varepsilon^{-5}\}.$$

If $n^{-1}\varepsilon^{-5} \to \infty$, then

$$h_{\mathrm{opt}} \sim \varepsilon\{C(f)\,n^{-1}\varepsilon^{-5}\}^{1/5} = C(f)^{1/5}n^{-1/5}.$$

Standard density estimation theory (Parzen, 1962) shows that this is the asymptotically optimal smoothing parameter for the estimation of the density at zero. Thus, in this case, the best estimate of $A_\varepsilon$ will be based on the best estimate of the density.

Unfortunately this will by no means always be the case. If $n^{-1}\varepsilon^{-5} \to 0$, we will have

$$h_{\mathrm{opt}} \sim \varepsilon\{C(f)\,n^{-1}\varepsilon^{-5}\}^{1/2} = C(f)^{1/2}n^{-1/2}\varepsilon^{-3/2},$$

and if $n^{-1}\varepsilon^{-5} \to a$, where $0 < a < \infty$,

$$h_{\mathrm{opt}} \sim \varepsilon\,\psi\{a\,C(f)\}.$$

In neither of these cases will it be optimal to construct an optimal estimate of $f$ in order to estimate $A_\varepsilon(F)$, since the optimal choice of $h$ will be smaller, in order of magnitude in the first case, than that required for the estimation of $f$ itself. Thus the optimal estimate of $A_\varepsilon(F)$ will be based on an undersmoothed estimate of the underlying density. This example is, of course, rather artificial, but it does illustrate the likely difficulty of obtaining general rules for deciding how much to smooth when estimating functionals of a density. Even in cases where smoothing is advantageous, the amount of smoothing required may be quite different from that needed for the estimation of the density itself.
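The function $\psi$ has no closed form, but since $(1+\psi^2)^{3/2}\psi^2$ increases from $0$ to $\infty$ it can be found by bisection. The hypothetical sketch below computes $h_{\mathrm{opt}} = \varepsilon\,\psi\{C(f)n^{-1}\varepsilon^{-5}\}$, confirms that it minimizes the approximate mean square error, and compares it with the density-optimal bandwidth $\{C(f)/n\}^{1/5}$, taking $f$ standard normal as an illustrative assumption (so $f''(0) = -f(0)$).

```python
import math


def psi(R, tol=1e-12):
    """Root in (0, inf) of (1 + p^2)^(3/2) * p^2 = R, by bisection.
    The left-hand side increases from 0 to infinity, so the root is unique."""
    if R <= 0:
        raise ValueError("R must be positive")

    def g(p):
        return (1.0 + p * p) ** 1.5 * p * p - R

    lo, hi = 0.0, 1.0
    while g(hi) < 0:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def c_of_f(f0, f2):
    """C(f) = f(0) / {2 sqrt(pi) f''(0)^2}."""
    return f0 / (2.0 * math.sqrt(math.pi) * f2 ** 2)


def h_opt(n, eps, f0, f2):
    """Minimiser of the approximate MSE: h = eps * psi{C(f) n^-1 eps^-5}."""
    return eps * psi(c_of_f(f0, f2) / (n * eps ** 5))


def approx_mse(h, n, eps, f0, f2):
    """n^-1 f(0) / (2 delta sqrt(pi)) + h^4 f''(0)^2 / 4 with delta^2 = eps^2 + h^2."""
    delta = math.sqrt(eps * eps + h * h)
    return f0 / (2.0 * math.sqrt(math.pi) * n * delta) + 0.25 * h ** 4 * f2 ** 2


# Standard normal f: f(0) = 1/sqrt(2 pi) and f''(0) = -f(0).
f0 = 1.0 / math.sqrt(2.0 * math.pi)
f2 = -f0
n = 100
density_h = (c_of_f(f0, f2) / n) ** 0.2      # optimal h for estimating f(0)
for eps in (1e-3, 0.1, 0.5, 1.0):
    print(eps, h_opt(n, eps, f0, f2), density_h)
```

For very small $\varepsilon$ the two bandwidths agree, while for larger $\varepsilon$ the optimal $h$ for $A_\varepsilon$ is smaller, which is the undersmoothing described above.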


3. MORE GENERAL FUNCTIONALS

3.1 Linear approximation

In this section, the work of Section 2 is extended, by considering local linear approximations, to more general functionals of an unknown distribution. When an explicit bootstrap method is being used the functional being estimated is unlikely to be linear, and so a more general theory is necessary. Local linear approximations to functionals of distributions have also been used by Hinkley and Wei (1984) and Withers (1983).

Consider the estimation of a functional $\alpha(F_0)$ of an unknown distribution $F_0$ underlying a set of sample data. Suppose that $\alpha$ admits a linear von Mises expansion about $F_0$ given by

$$\alpha(F) \approx \alpha(F_0) + A(F - F_0), \qquad (3.1)$$

where the linear functional $A$ is representable as an integral

$$A(F - F_0) = \int a(t)\,d(F - F_0)(t). \qquad (3.2)$$

A detailed discussion of differentiation of functionals and general von Mises approximation is given by Fernholz (1983). The precise accuracy of the expansion (3.1) depends on the detailed properties of $\alpha$, but the error will in general be of order $\sup|F - F_0|^2$.

The expansion (3.1) gives an obvious approximation to the bootstrap estimate of $\alpha(F_0)$. If $\hat F$ is an estimate of $F_0$, then we will in general have, provided $\sup|\hat F - F_0|$ is $O_p(n^{-1/2})$,

$$\alpha(\hat F) = \alpha(F_0) + A(\hat F) - A(F_0) + O_p(n^{-1}),$$

and so the sampling properties of $\alpha(\hat F)$ will be approximately the same as those of $A(\hat F)$. The criteria of Section 2 can then be applied to the linear functional $A$. If using an appropriate smoothed bootstrap will improve the estimation of $A(F_0)$ then, neglecting any errors inherent in the linear approximation (3.1), the smoothed bootstrap will be worth using in the estimation of $\alpha(F_0)$.
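For a functional where everything is explicit, the variance $\alpha(F) = \int x^2\,dF - (\int x\,dF)^2$, the function in (3.2) is $a(t) = (t - \mu)^2$ with $\mu$ the mean of $F_0$, and the remainder of (3.1) at $F = F_n$ is exactly $-(\bar x - \mu)^2$, which is $O_p(n^{-1})$ as claimed. The sketch below (hypothetical names, $F_0$ taken to be standard normal for illustration) checks this identity numerically.

```python
import random


def variance_functional(xs):
    """alpha(F_n) = int x^2 dF_n - (int x dF_n)^2."""
    n = len(xs)
    m1 = sum(xs) / n
    return sum(x * x for x in xs) / n - m1 * m1


def linear_term(xs, mu, sigma2):
    """A(F_n - F_0) = int a d(F_n - F_0) with a(t) = (t - mu)^2."""
    return sum((x - mu) ** 2 for x in xs) / len(xs) - sigma2


rng = random.Random(0)
mu, sigma2 = 0.0, 1.0                       # F_0 = N(0, 1)
for n in (10, 100, 1000):
    xs = [rng.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    rem = variance_functional(xs) - sigma2 - linear_term(xs, mu, sigma2)
    xbar = sum(xs) / n
    print(n, rem, -(xbar - mu) ** 2)        # equal up to rounding error
```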

3.2 The transformed sample correlation coefficient

In this section we consider application of the linear approximation procedure to estimation of the sampling standard deviation of the variance-stabilized sample correlation coefficient. Suppose $F_0$ is a bivariate distribution with mean zero and correlation coefficient $\rho$, and let $\zeta = \tanh^{-1}\rho$. Let $r$ be the computed sample correlation coefficient based on a sample of $n$ independent observations from $F_0$, and let $z = \tanh^{-1}r$ be the sample estimate of $\zeta$. Then the functional of interest is $\alpha_n(F_0) = (\mathrm{var}\,z)^{1/2}$. Efron (1982) devoted considerable attention to the estimation of $\alpha_n(F_0)$ by a variety of methods, including the smoothed bootstrap, for the specific case of $F_0$ bivariate normal, with marginals of unit variance and $\rho = \tfrac12$, and for sample size $n = 14$.

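A key step in our investigation of the estimation of $\alpha_n(F_0)$ will be an approximate formula, given by Kendall and Stuart (1977, p. 236). Let

$$\alpha(F_0) = \Big[\rho^2(1-\rho^2)^{-2}\Big\{\frac{\mu_{40}}{4\mu_{20}^2} + \frac{\mu_{04}}{4\mu_{02}^2} + \frac{\mu_{22}}{2\mu_{20}\mu_{02}} + \frac{\mu_{22}}{\mu_{11}^2} - \Big(\frac{\mu_{31}}{\mu_{11}\mu_{20}} + \frac{\mu_{13}}{\mu_{11}\mu_{02}}\Big)\Big\}\Big]^{1/2}, \qquad (3.3)$$

where $\rho = \mu_{11}(\mu_{20}\mu_{02})^{-1/2}$ and $\mu_{ij}$ is the $(i,j)$ moment given by

$$\mu_{ij} = \int x_1^i x_2^j\,dF_0(x).$$

Here and subsequently in this section unsubscripted letters $x$ will denote vectors $(x_1, x_2)$. Kendall and Stuart give

$$\alpha_n(F_0) = n^{-1/2}\alpha(F_0) + O(n^{-1}),$$

so that estimation of $\alpha_n(F_0)$ is approximately equivalent to that of $\alpha(F_0)$.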
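Formula (3.3) involves only second and fourth moments, so it is easy to evaluate. The sketch below, with hypothetical function names, checks that it gives $\alpha(F_0) = 1$ for every bivariate normal $F_0$, so that $\alpha_n(F_0) \approx n^{-1/2}$, the familiar large-sample standard deviation of $z$; dropping the factor $(1-\rho^2)^{-2}$ gives $1 - \rho^2$, the corresponding quantity for $r$.

```python
import math


def alpha_from_moments(m, transformed=True):
    """alpha(F) of (3.3).  m maps (i, j) to mu_ij = int x1^i x2^j dF for a
    mean-zero F; transformed=False omits the (1 - rho^2)^-2 factor, giving
    the corresponding quantity for the untransformed coefficient r."""
    rho = m[1, 1] / math.sqrt(m[2, 0] * m[0, 2])
    bracket = (m[4, 0] / (4 * m[2, 0] ** 2) + m[0, 4] / (4 * m[0, 2] ** 2)
               + m[2, 2] / (2 * m[2, 0] * m[0, 2]) + m[2, 2] / m[1, 1] ** 2
               - m[3, 1] / (m[1, 1] * m[2, 0]) - m[1, 3] / (m[1, 1] * m[0, 2]))
    value = rho ** 2 * bracket
    if transformed:
        value /= (1 - rho ** 2) ** 2
    return math.sqrt(value)


def bivariate_normal_moments(rho):
    """Moments of the bivariate normal with unit variances and correlation rho."""
    return {(2, 0): 1.0, (0, 2): 1.0, (1, 1): rho,
            (4, 0): 3.0, (0, 4): 3.0, (2, 2): 1 + 2 * rho ** 2,
            (3, 1): 3 * rho, (1, 3): 3 * rho}


m = bivariate_normal_moments(0.5)
n = 14
print(alpha_from_moments(m) / math.sqrt(n))                      # approx. sd of z
print(alpha_from_moments(m, transformed=False) / math.sqrt(n))   # approx. sd of r
```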

Consider now the calculation of the function $a(t)$ defined in (3.2). For fixed $t$ let $\delta_t$ be the distribution function of a point mass at $t$ and, for any $\varepsilon > 0$, let $F_\varepsilon$ be the improper distribution $F_0 + \varepsilon\delta_t$. Then simple calculus combining (3.1) and (3.2) gives

$$a(t) = \frac{d}{d\varepsilon}\,\alpha(F_\varepsilon)\Big|_{\varepsilon=0}. \qquad (3.4)$$
Our functional $\alpha(F)$ is defined for improper distributions, as well as for probability distribution functions, and hence there is no need when calculating $a(t)$ to consider the more complicated perturbation $\varepsilon(\delta_t - F_0)$ to $F_0$ used by Hinkley and Wei (1984). The actual algebraic manipulations required in the calculation of $a(t)$ from (3.4) and (3.3) are extremely laborious. However, it is relatively easy to write a program in a computer algebraic symbolic manipulation language, such as MACSYMA, to perform the necessary differentiations and substitutions. The function $a(t)$ itself is a fourth order polynomial in $t_1$ and $t_2$ whose coefficients depend on the moments of $F_0$. It is only used as an intermediate step, in the special cases considered below, in the calculation of the criteria derived from Theorems 1 and 2, and the calculation of these criteria was also performed by computer algebra. Further details of the manipulations are available from the authors.
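Where computer algebra is not at hand, (3.4) can be approximated numerically. Because (3.3) depends on $F$ only through unnormalized moments, the improper distribution $F_0 + \varepsilon\delta_t$ simply adds $\varepsilon t_1^i t_2^j$ to each $\mu_{ij}$. The sketch below is a finite-difference substitute for the symbolic calculation, not the authors' MACSYMA program, and its names are hypothetical. As sanity checks, $a(0, 0) = 0$, $a$ is even in $t$ (every monomial has even degree), and since $\alpha$ is homogeneous of degree $-\tfrac12$ in the moments, $\int a\,dF_0 = -\tfrac12\alpha(F_0)$.

```python
import math

KEYS = [(2, 0), (0, 2), (1, 1), (4, 0), (0, 4), (2, 2), (3, 1), (1, 3)]


def alpha_from_moments(m):
    """alpha(F) of (3.3); m maps (i, j) to mu_ij = int x1^i x2^j dF.
    The moments are not normalised by the total mass of F, so improper
    distributions such as F0 + eps * delta_t are handled directly."""
    rho = m[1, 1] / math.sqrt(m[2, 0] * m[0, 2])
    bracket = (m[4, 0] / (4 * m[2, 0] ** 2) + m[0, 4] / (4 * m[0, 2] ** 2)
               + m[2, 2] / (2 * m[2, 0] * m[0, 2]) + m[2, 2] / m[1, 1] ** 2
               - m[3, 1] / (m[1, 1] * m[2, 0]) - m[1, 3] / (m[1, 1] * m[0, 2]))
    return math.sqrt(rho ** 2 * bracket / (1 - rho ** 2) ** 2)


def perturb(m, t, eps):
    """Moments of F0 + eps * delta_t: each mu_ij gains eps * t1^i * t2^j."""
    return {(i, j): m[i, j] + eps * t[0] ** i * t[1] ** j for (i, j) in KEYS}


def influence(t, m, eps=1e-6):
    """Central-difference approximation to (3.4): a(t) = d alpha(F_eps)/d eps at 0."""
    return (alpha_from_moments(perturb(m, t, eps))
            - alpha_from_moments(perturb(m, t, -eps))) / (2 * eps)


def scale_derivative(m, eps=1e-6):
    """d alpha(c F0)/dc at c = 1, which equals int a dF0; by homogeneity of
    degree -1/2 in the moments it should be -alpha(F0)/2."""
    up = {k: (1 + eps) * v for k, v in m.items()}
    down = {k: (1 - eps) * v for k, v in m.items()}
    return (alpha_from_moments(up) - alpha_from_moments(down)) / (2 * eps)


rho = 0.5
F0 = {(2, 0): 1.0, (0, 2): 1.0, (1, 1): rho, (4, 0): 3.0, (0, 4): 3.0,
      (2, 2): 1 + 2 * rho ** 2, (3, 1): 3 * rho, (1, 3): 3 * rho}
for t in [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, -0.5)]:
    print(t, influence(t, F0))
print(scale_derivative(F0))
```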

To complete this section we consider the results of the application of the computer algebraic manipulation procedure to the functional (3.3) for two special cases. Further details of the results discussed will be given in Section 3.3 below. Let $A_s(F_0)$ be the criterion obtained from Theorem 2 for the shrunk smoothed bootstrap to be advantageous in the estimation of the functional $\alpha(F_0)$, namely $A_s(F_0) = \mathrm{cov}\{a(X), a^*(X)\}$. Recall that $A_s(F_0) < 0$ means that some smoothing at least is worthwhile.

Suppose, first, that the distribution of the data can be reduced by an affine transformation to a radially symmetric distribution $F_r$. Without loss of generality it can be assumed that $F_r$ has unit marginal variances. Let $R$ be the radial component of $F_r$ and denote by $s_j$ the $j$th central moment of $R^2$. Computer algebra shows that the criterion $A_s(F_0)$ reduces, in this case, to

$$A_s(F_0) = -\beta_0^2\{3s_4 + (4 - 3s_2)s_3 + s_2^3 + 2s_2^2 + 24s_2 + 16\}/32, \qquad (3.5)$$

where $\beta_0$ is the positive quantity $\tfrac12\alpha(F_0)^{-1}$. Using the standard inequality $s_3^2 \le s_2s_4$, we have

$$-32A_s(F_0)/\beta_0^2 \ge 3s_4 - 4s_2^{1/2}s_4^{1/2} - 3s_2^{3/2}s_4^{1/2} + s_2^3 + 2s_2^2 + 24s_2 + 16$$
$$= 3\big(s_4^{1/2} - \tfrac12 s_2^{3/2} - \tfrac23 s_2^{1/2}\big)^2 + \tfrac14 s_2^3 + \tfrac{68}{3}s_2 + 16 \ge 16.$$

This gives the general conclusion that $A_s(F_0) \le -\tfrac12\beta_0^2 < 0$ for any distribution $F_0$ which can be affinely transformed to radial symmetry.
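Formula (3.5) can be cross-checked by simulation. The sketch below computes $\mathrm{cov}\{a_s(Y), a_s^*(Y)\}/\beta_0^2$ directly from draws of a radially symmetric $Y$, using the forms of $a_s$ and $a_s^*$ given in Section 3.3, and compares it with (3.5). The two test distributions are illustrative assumptions: the bivariate normal ($R^2$ exponential with mean 2, so $s_2, s_3, s_4 = 4, 16, 144$) and the uniform distribution on the disc of radius 2 ($R^2$ uniform on $(0, 4)$, so $s_2 = 4/3$, $s_3 = 0$, $s_4 = 3.2$). Function names are hypothetical.

```python
import math
import random


def radial_criterion(s2, s3, s4):
    """A_s(F0) / beta0^2 from (3.5); s_j is the jth central moment of R^2."""
    return -(3 * s4 + (4 - 3 * s2) * s3 + s2 ** 3 + 2 * s2 ** 2 + 24 * s2 + 16) / 32


def monte_carlo_criterion(draw_T, N=100_000, seed=0):
    """cov{a_s(Y), a_s*(Y)} / beta0^2 for Y = (R cos Theta, R sin Theta), with
    R^2 drawn by draw_T (mean 2) and Theta uniform on (0, 2 pi)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(N):
        r = math.sqrt(draw_T(rng))
        th = rng.uniform(0.0, 2.0 * math.pi)
        ys.append((r * math.cos(th), r * math.sin(th)))
    k22 = sum((y1 * y2) ** 2 for y1, y2 in ys) / N        # kappa_22 = E Y1^2 Y2^2
    a = [(y1 * y2) ** 2 - k22 * (y1 * y1 + y2 * y2) for y1, y2 in ys]
    b = [2 * (1 + k22) * (y1 * y1 + y2 * y2) - 4 * k22 - 4 * (y1 * y2) ** 2
         for y1, y2 in ys]
    ma, mb = sum(a) / N, sum(b) / N
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / N


print(radial_criterion(4, 16, 144))                  # bivariate normal
mc_uniform = monte_carlo_criterion(lambda g: g.uniform(0.0, 4.0))
print(radial_criterion(4 / 3, 0.0, 3.2), mc_uniform)  # uniform disc of radius 2
```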

Another class of distributions for which $A_s(F_0)$ is guaranteed not to be positive is the class for which a particular affine transformation of $F_0$ to unit variance/covariance matrix yields a distribution with independent marginals. Let $X$ be a random vector with distribution $F_0$, and let $\sigma_1^2 = \mathrm{var}\,X_1$, $\sigma_2^2 = \mathrm{var}\,X_2$ and $\rho = \mathrm{corr}(X_1, X_2)$. Define a matrix $S$ by

$$S = \begin{pmatrix}\sigma_1 & 0\\ 0 & \sigma_2\end{pmatrix}\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}^{1/2}; \qquad (3.6)$$

here the power $\tfrac12$ denotes the symmetric positive-definite square root. Define a bivariate distribution $F^*$ by $F^*(u) = F_0(Su)$ for $u$ in $\mathbb{R}^2$. A random vector $Y = S^{-1}X$ with distribution $F^*$ and unit variance/covariance matrix can be obtained first by rescaling the marginals of $X$ to have unit variance and then by rescaling the principal components of the resulting vector to have unit variances. If this natural affine transform of $F_0$ has independent marginals, then an argument given in Section 3.3 below demonstrates that $A_s(F_0) \le 0$, with equality only if $X$ has a uniform discrete distribution giving probability $\tfrac14$ to each of four points.
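The matrix $S$ in (3.6) can be written down explicitly, because $\begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}$ has eigenvalues $1 \pm \rho$ with eigenvectors $(1, 1)/\sqrt2$ and $(1, -1)/\sqrt2$. The sketch below (hypothetical names, plain lists for $2 \times 2$ matrices, illustrative values of $\sigma_1$, $\sigma_2$, $\rho$) builds $S$ and confirms that $Y = S^{-1}X$ has identity variance matrix, that is, $S^{-1}\Sigma S^{-T} = I$ where $\Sigma$ is the variance matrix of $X$.

```python
import math


def s_matrix(sigma1, sigma2, rho):
    """S = diag(sigma1, sigma2) P^(1/2), P = [[1, rho], [rho, 1]], using the
    symmetric square root P^(1/2) = [[a, b], [b, a]] built from the
    eigenvalues 1 + rho and 1 - rho of P."""
    p, q = math.sqrt(1 + rho), math.sqrt(1 - rho)
    a, b = (p + q) / 2, (p - q) / 2
    return [[sigma1 * a, sigma1 * b], [sigma2 * b, sigma2 * a]]


def inverse(M):
    """Inverse of a 2 x 2 matrix."""
    (m11, m12), (m21, m22) = M
    det = m11 * m22 - m12 * m21
    return [[m22 / det, -m12 / det], [-m21 / det, m11 / det]]


def transpose(M):
    return [[M[0][0], M[1][0]], [M[0][1], M[1][1]]]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


sigma1, sigma2, rho = 2.0, 0.5, 0.6
S = s_matrix(sigma1, sigma2, rho)
Sigma = [[sigma1 ** 2, rho * sigma1 * sigma2], [rho * sigma1 * sigma2, sigma2 ** 2]]
S_inv = inverse(S)
Y_cov = matmul(matmul(S_inv, Sigma), transpose(S_inv))
print(Y_cov)   # identity matrix up to rounding
```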

In summary, we have derived the following conclusion. Provided all the approximations we have made are reasonable, using a shrunk smoothed bootstrap, with an appropriate smoothing parameter, will give improved estimation of $\alpha_n(F_0)$ over that obtained by the standard bootstrap, if either $F_0$ is an affine transformation of a radially symmetric distribution or $F_0$ is an affine transformation, of a particular kind, of a distribution with independent marginals and $F_0$ is not a uniform four-point discrete distribution. In practice the underlying distribution $F_0$ will not be known. An obvious topic for future investigation is the construction of empirical versions of the criteria of Theorems 1 and 2, on the basis of which a decision whether or not to smooth can be made for each data set encountered. Some preliminary simulations along these lines have been encouraging.
3.3 Some technical details

Throughout this section, define the matrix $S$ as in (3.6), and suppose that $X$ is a random vector with distribution $F_0$. Let $Y = S^{-1}X$ as in Section 3.2, and let $F^*(y) = F_0(Sy)$ be the distribution of $Y$.

It is easily seen that the existence of an affine transformation reducing $F_0$ to radial symmetry is equivalent to the radial symmetry of the particular affine transformation $F^*$.

Define $a_s(u) = a(Su)$ and let $\kappa_{ij} = E\,Y_1^iY_2^j$, where $Y = S^{-1}X$. In both of the two special cases considered in Section 3.2, $\kappa_{13} = \kappa_{31} = 0$, and computer algebraic manipulation showed that $a_s(u)$ reduces to the simple form

$$a_s(u) = \{u_1^2u_2^2 - \kappa_{22}(u_1^2 + u_2^2)\}\beta_0.$$

The criterion given in Theorem 2 also reduces to a simple form when expressed in terms of $a_s$. We have, by standard calculus,

$$a^*(X) = D_V a(X) - X\cdot\nabla a(X) = \nabla^2 a_s(Y) - Y\cdot\nabla a_s(Y) = a_s^*(Y), \text{ say},$$

where $a_s^*(u) = \{2(1+\kappa_{22})(u_1^2 + u_2^2) - 4\kappa_{22} - 4u_1^2u_2^2\}\beta_0$.

Since, by definition, $a(X) = a_s(Y)$, it follows that

$$A_s(F_0) = \mathrm{cov}\{a(X), a^*(X)\} = \mathrm{cov}\{a_s(Y), a_s^*(Y)\} = E[\{a_s(Y) + \kappa_{22}\beta_0\}\,a_s^*(Y)], \qquad (3.7)$$

since it is immediate that $E\,a_s(Y) = -\kappa_{22}\beta_0$.

Suppose, now, that the distribution of $Y$ is radially symmetric, so that $Y^T = (R\cos\Theta, R\sin\Theta)$ with $\Theta$ uniformly distributed on $(0, 2\pi)$ independently of $R$. The form (3.7) for $A_s(F_0)$ can be expressed in terms of even moments of $Y$ up to order 8, and each of these moments can be expressed in terms of the moments of $R^2$. For example

$$\kappa_{22} = E\,R^4\sin^2\Theta\cos^2\Theta = E\,R^4/8 = (s_2 + 4)/8,$$

where, as in Section 3.2, $s_j = E(R^2 - 2)^j$ is the $j$th central moment of $R^2$; the assumption that $EY_1^2 = EY_2^2 = 1$ implies that $R^2$ has mean 2. Performing all these substitutions, by computer algebra, yields the form (3.5) for $A_s(F_0)$ and hence the conclusion given in Section 3.2 for distributions that can be transformed to radial symmetry.

Now suppose that $Y_1$ and $Y_2$ are independent, but that $Y$ is not necessarily radially symmetric. It will then be the case that $\kappa_{22} = EY_1^2\,EY_2^2 = 1$ and hence

$$a_s^*(u) = -4\beta_0(u_1^2u_2^2 - u_1^2 - u_2^2 + 1) = -4\{a_s(u) + \beta_0\}.$$

It follows that $A_s(F_0) = -4\,\mathrm{var}\,a_s(Y)$. Since $Y_1$ and $Y_2$ are independent, the only way $\mathrm{var}\,a_s(Y)$ can be zero is for $Y$ to have the four-point distribution giving probability $\tfrac14$ to each of the points $(\pm1, \pm1)$; otherwise $a_s(Y)$ has positive variance, and $A_s(F_0) < 0$.

4. SIMULATION STUDY

The discussion in Section 3 above involved heavy dependence on two approximations, one of them specific to the example under consideration and the other a key feature of our proposed general methodology. In this section, we investigate both of these approximations by a simulation study which extends the one carried out by Efron (1982, Table 5.2). All our simulations are carried out under the assumptions of Efron's model, that $F_0$ is the bivariate normal distribution with unit marginal variances and correlation $\tfrac12$. Efron considered only samples of size 14, though we consider here larger sample sizes as well. We follow Efron in using the values 0 and $\tfrac12$ for the smoothing parameter $h$.

For each sample size $n$, the accuracy of the bootstrap and smoothed bootstrap estimates of the sampling standard deviation $\alpha_n(F_0)$ of the variance-stabilized correlation coefficient was assessed in three different ways. Firstly, a direct simulation of the bootstrap procedure itself was carried out: two hundred datasets were generated from $F_0$ and for each one $\alpha_n(F_0)$ was estimated by the usual resampling procedure, using two hundred resampled datasets of size $n$ in each case. The true value of $\alpha_n(F_0)$ is known and so it is possible to estimate the root mean square error of the direct bootstrap procedures from our simulations. The results thus obtained are labelled "direct" in Table 1.
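The direct simulation can be sketched at a much smaller scale. This is an illustrative version under stated assumptions, not the authors' code: it uses 40 datasets and 100 resamples instead of 200 and 200, replaces the known true value of $\alpha_n(F_0)$ by a Monte Carlo estimate, builds the shrunk smoothed bootstrap (1.2) with a normal kernel on mean-centred data (which leaves $r$ unchanged), and uses hypothetical function names.

```python
import math
import random
import statistics


def corr(pairs):
    """Sample correlation coefficient r of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / math.sqrt(sxx * syy)


def draw_f0(n, rng, rho=0.5):
    """n pairs from Efron's F_0: bivariate normal, unit variances, correlation rho."""
    c = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((u, rho * u + c * v))
    return pairs


def shrunk_resample(pairs, h, rng):
    """One shrunk smoothed bootstrap sample ((1.2), normal kernel) on mean-centred
    data; V is the sample variance matrix (divisor n), used through its Cholesky
    factor L with L L^T = V.  h = 0 gives an ordinary bootstrap sample."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    centred = [(x - mx, y - my) for x, y in pairs]
    v11 = sum(x * x for x, _ in centred) / n
    v12 = sum(x * y for x, y in centred) / n
    v22 = sum(y * y for _, y in centred) / n
    l11 = math.sqrt(v11)
    l21 = v12 / l11
    l22 = math.sqrt(v22 - l21 * l21)
    shrink = 1.0 / math.sqrt(1.0 + h * h)
    sample = []
    for _ in range(n):
        x, y = rng.choice(centred)
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        sample.append((shrink * (x + h * l11 * e1),
                       shrink * (y + h * (l21 * e1 + l22 * e2))))
    return sample


def bootstrap_sd_z(pairs, h, n_boot, rng):
    """Bootstrap (h = 0) or shrunk smoothed bootstrap estimate of sd(z), z = atanh(r)."""
    zs = [math.atanh(corr(shrunk_resample(pairs, h, rng))) for _ in range(n_boot)]
    return statistics.stdev(zs)


def true_sd_z(n, reps, rng):
    """Monte Carlo stand-in for the true alpha_n(F_0) = sd(z)."""
    return statistics.stdev([math.atanh(corr(draw_f0(n, rng))) for _ in range(reps)])


def direct_rmse(n, h, n_data, n_boot, target, seed=0):
    """Root mean square error of the bootstrap estimate over n_data datasets from F_0."""
    rng = random.Random(seed)
    errors = [(bootstrap_sd_z(draw_f0(n, rng), h, n_boot, rng) - target) ** 2
              for _ in range(n_data)]
    return math.sqrt(sum(errors) / n_data)


target = true_sd_z(14, 10_000, random.Random(1))
results = {h: direct_rmse(14, h, n_data=40, n_boot=100, target=target) for h in (0.0, 0.5)}
print(target, results)
```

Because the same seed is used for both values of $h$ and the random number stream is consumed identically, the two runs see the same simulated datasets, which sharpens the comparison.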


                Variance-stabilized          Untransformed
  n     h     direct   linear   delta     direct   linear   delta
 14     0     0.075    0.071    0.077     0.070    0.076    0.060
 14    1/2    0.045    0.046    0.057     0.057    0.055    0.052
 20     0     0.049    0.050    0.053     0.046    0.053    0.044
 20    1/2    0.033    0.032    0.037     0.045    0.039    0.041
 30     0     0.029    0.033    0.033     0.033    0.036    0.030
 30    1/2    0.019    0.021    0.022     0.027    0.026    0.027
 40     0     0.024    0.025    0.025     0.024    0.027    0.027
 40    1/2    0.015    0.016    0.017     0.021    0.019    0.020
 50     0     0.020    0.020    0.021     0.020    0.021    0.019
 50    1/2    0.013    0.013    0.014     0.019    0.015    0.018
100     0     0.011    0.010    0.010     0.010    0.011    0.010
100    1/2    0.008    0.006    0.007     0.009    0.008    0.008

Table 1: Estimates of root mean square errors of bootstrap estimates of sampling standard deviations of variance-stabilized and untransformed correlation coefficients, for sample sizes n and smoothing parameters h.


Secondly, in order to investigate the accuracy of our linear approximation, some analytic calculations were carried out making use of computer algebra. By this means, the behaviour of the approximation can be studied without recourse to any simulation. For the bivariate normal population under consideration, the standard deviation of $\hat A_{h,s}(F_0)$ was found to be $n^{-1}(1+h^2)^{-2}$. This quantity is referred to as the "linear" estimate of the root mean square error of the bootstrap procedure. Closeness of the "linear" and "direct" estimates of root mean square error would vindicate our proposed procedure of studying the sampling properties of the bootstrap by means of linear approximations.

Our development of the linear approximation involved the intermediate step of approximating $\alpha_n(F_0)$ by $n^{-1/2}\alpha(F_0)$, as given in (3.3). This intermediate approximation raises the possibility of studying the sampling properties of the smoothed bootstrap by considering those of the approximation (3.3), with $F_0$ replaced by $\hat F_{h,s}$. This corresponds to substituting the moments of $\hat f_{h,s}$, which are easily calculated from the sample, into (3.3). By analogy with Efron (1982, Chapter 6), we refer to this procedure as the non-parametric delta approximation to the smoothed bootstrap. For each of two hundred simulated samples from $F_0$ this approximation was calculated. From the values thus obtained, a third estimate of the root mean square error of the smoothed bootstrap procedure was found. This is labelled "delta" in Table 1.
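The delta approximation needs only the second and fourth moments of $\hat f_{h,s}$. With a normal kernel these are exact finite sums: a draw from (1.2) is $(X_I + hZ)/(1+h^2)^{1/2}$ with $Z \sim N(0, V)$, and the moments of $Z$ up to order four follow from Isserlis' theorem. The sketch below assumes mean-centred data, $V$ equal to the sample variance matrix (divisor $n$), and hypothetical function names; it is an illustration, not the code used for Table 1.

```python
import math
import random
from math import comb

KEYS = [(2, 0), (0, 2), (1, 1), (4, 0), (0, 4), (2, 2), (3, 1), (1, 3)]


def alpha_from_moments(m, transformed=True):
    """alpha(F) of (3.3); transformed=False omits the (1 - rho^2)^-2 factor (for r)."""
    rho = m[1, 1] / math.sqrt(m[2, 0] * m[0, 2])
    bracket = (m[4, 0] / (4 * m[2, 0] ** 2) + m[0, 4] / (4 * m[0, 2] ** 2)
               + m[2, 2] / (2 * m[2, 0] * m[0, 2]) + m[2, 2] / m[1, 1] ** 2
               - m[3, 1] / (m[1, 1] * m[2, 0]) - m[1, 3] / (m[1, 1] * m[0, 2]))
    value = rho ** 2 * bracket
    if transformed:
        value /= (1 - rho ** 2) ** 2
    return math.sqrt(value)


def gauss_moment(p, q, v11, v12, v22):
    """E[Z1^p Z2^q] for Z ~ N(0, V) and p + q <= 4 (Isserlis' theorem);
    moments of odd total order are zero."""
    table = {(0, 0): 1.0, (2, 0): v11, (1, 1): v12, (0, 2): v22,
             (4, 0): 3 * v11 ** 2, (3, 1): 3 * v11 * v12,
             (2, 2): v11 * v22 + 2 * v12 ** 2,
             (1, 3): 3 * v22 * v12, (0, 4): 3 * v22 ** 2}
    return table.get((p, q), 0.0)


def shrunk_moments(pairs, h):
    """mu_ij of the shrunk kernel estimate (1.2) with a normal kernel, built on
    mean-centred data with V the sample variance matrix (divisor n)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    c = [(x - mx, y - my) for x, y in pairs]
    v11 = sum(x * x for x, _ in c) / n
    v12 = sum(x * y for x, y in c) / n
    v22 = sum(y * y for _, y in c) / n
    m = {}
    for i, j in KEYS:
        total = 0.0
        for x, y in c:
            for p in range(i + 1):
                for q in range(j + 1):
                    total += (comb(i, p) * comb(j, q) * x ** (i - p) * y ** (j - q)
                              * h ** (p + q) * gauss_moment(p, q, v11, v12, v22))
        m[i, j] = total / n / (1.0 + h * h) ** ((i + j) / 2)
    return m


def delta_sd(pairs, h, transformed=True):
    """Non-parametric delta approximation n^-1/2 alpha(F_hs) to the bootstrap estimate."""
    return alpha_from_moments(shrunk_moments(pairs, h), transformed) / math.sqrt(len(pairs))


def draw_pairs(n, rng, rho=0.5):
    """n pairs from the bivariate normal with unit variances and correlation rho."""
    c = math.sqrt(1.0 - rho * rho)
    return [(u, rho * u + c * v) for u, v in
            ((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n))]


pairs = draw_pairs(14, random.Random(0))
print(delta_sd(pairs, 0.0), delta_sd(pairs, 0.5))
```

With $V$ the sample variance matrix, the second moments of $\hat f_{h,s}$ coincide with those of the centred data for every $h$; smoothing changes only the fourth moments, and hence the delta estimate, through the kernel contribution.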

The analogous investigation was carried out for the untransformed correlation coefficient $r$, in the context of the same bivariate normal model. The factor $(1-\rho^2)^{-2}$ is omitted from (3.3) in this case; otherwise the same algebraic manipulations and simulations were performed as for the variance-stabilized coefficient $z$. The "linear" estimate of the root mean square error is now $\tfrac34 n^{-1}(1+h^2)^{-2}(2+2h^2+h^4)^{1/2}$. The results are presented in the last three columns of Table 1.
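Both "linear" formulas can be tabulated directly, with no simulation at all. The leading $\tfrac34$ in the formula for $r$ equals $1 - \rho^2$ at $\rho = \tfrac12$. The hypothetical sketch below prints the two "linear" columns for the sample sizes and smoothing parameters of Table 1.

```python
import math


def linear_rmse_z(n, h):
    """'Linear' RMSE for the variance-stabilized coefficient z: n^-1 (1 + h^2)^-2."""
    return (1.0 + h * h) ** -2 / n


def linear_rmse_r(n, h):
    """'Linear' RMSE for r: (3/4) n^-1 (1 + h^2)^-2 (2 + 2 h^2 + h^4)^(1/2)."""
    return 0.75 * (1.0 + h * h) ** -2 * math.sqrt(2.0 + 2.0 * h * h + h ** 4) / n


for n in (14, 20, 30, 40, 50, 100):
    for h in (0.0, 0.5):
        print(f"{n:4d} {h:4.1f} {linear_rmse_z(n, h):.3f} {linear_rmse_r(n, h):.3f}")
```

Rounded to three decimals, these values agree with the "linear" columns of Table 1 in every row except $(n, h) = (30, 0)$ for $r$, where the formula gives 0.0354 against 0.036 in the table.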

The broad conclusions to be drawn from Table 1 are the same for both correlation coefficients. Even for the small sample size considered by Efron (1982), our linear approximation procedure gives good estimates of the accuracy of the full bootstrap procedure, and the relative improvement due to smoothing is well predicted. Efron's conclusions could have been obtained without recourse to any simulation. On the whole the delta procedure, which itself involves some simulation, gives slightly inferior estimates of the bootstrap's accuracy.

It is known (Davison, Hinkley and Schechtman, 1986) that the variance-stabilized correlation coefficient is highly correlated with its linear approximation, but the untransformed correlation coefficient is in general not. The suspicion expressed by a referee that this may have a deleterious effect on our approximations in the untransformed case does not appear to have been borne out by the simulation study, except that the beneficial effects of smoothing the bootstrap were systematically slightly exaggerated by the linear method in this case.